Abstract


In  the  sixth  quarter  of  the  work  effort,  we  focused  on  a)  conducting  experiments  on  real-world 
data  sets  using  the  developed  algorithms,  b)  design/implementation  of  the  Multiscale  Singular 
Value Decomposition (SVD) algorithm, and c) fine-tuning and bug fixes for the randomized SVD
and  ANN  algorithms.  This  report  documents  the  current  variants  of  the  Multiscale  SVD 
algorithms  under  development. 

The  project  is  currently  on  track  -  in  the  upcoming  quarters,  we  will  continue  applying  the 
developed  algorithms  to  various  data  sets  and  continue  improving  the  multiscale  SVD 
algorithms.  No  problems  are  currently  anticipated. 
2 Summary


In this quarter, we continued design of the new multiscale SVD algorithms. Development of the
algorithms is underway.

The  project  is  currently  on  track  -  in  the  upcoming  quarters,  we  will  continue  applying  the 
developed  algorithms  to  various  data  sets  and  continue  improving  the  multiscale  SVD 
algorithms.  No  problems  are  currently  anticipated. 
3 Introduction


The primary project effort over the last quarter focused on completing the design of the
multiscale SVD algorithms [1]. Descriptions of the multiscale SVD algorithms are provided in
Section 4.
4 Methods, Assumptions and Procedures


4.1  Multiscale  Singular  Value  Decomposition 

The  Singular  Value  Decomposition  (SVD)  [2]  is  a  fundamental  tool  in  linear  algebra  which 
provides  a  factorization  of  any  real  or  complex  matrix.  It  provides  complete  spectral  information 
for any linear operator. Given an $m \times n$ matrix $A$ of rank $k \le \min(m, n)$, the SVD represents $A$ in
the form


$$A = U D V^{*}$$


where $D$ is a $k \times k$ diagonal matrix whose elements are non-negative, and $U$ and $V$ are matrices
(of sizes $m \times k$ and $n \times k$, respectively) whose columns are orthonormal. The compression
provided by the SVD is optimal in terms of accuracy [3], and has a simple geometric
interpretation: it expresses each of the columns of $A$ as a linear combination of the $k$
(orthonormal) columns of $U$; it also represents the rows of $A$ as linear combinations of the
(orthonormal) rows of $V^{*}$; and the matrices $U$, $V$ are chosen in such a manner that the columns of $U$
are images (up to a scaling) under $A$ of the columns of $V$.
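
The properties listed above can be checked numerically. The following is a minimal sketch using NumPy, not part of the project software; the matrix sizes, the random test matrix and the rank threshold `tol` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example data: a 6 x 5 matrix of rank k = 2.
m, n, k = 6, 5, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

# Thin SVD. NumPy returns V* (here Vh_full) and the diagonal of D as a vector.
U_full, d_full, Vh_full = np.linalg.svd(A, full_matrices=False)

# Keep only the numerically nonzero singular values (threshold is an assumption).
tol = 1e-10 * d_full.max()
rank = int(np.sum(d_full > tol))
U = U_full[:, :rank]
D = np.diag(d_full[:rank])
V = Vh_full[:rank].conj().T

# A = U D V*
reconstruction_error = np.linalg.norm(A - U @ D @ V.conj().T)
# Columns of U and V are orthonormal.
u_orthonormal = np.allclose(U.conj().T @ U, np.eye(rank))
v_orthonormal = np.allclose(V.conj().T @ V, np.eye(rank))
# A maps each column of V to the matching column of U, scaled by a singular value.
images_match = np.allclose(A @ V, U @ D)
```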

However, for any given data set of observed points, the SVD may not necessarily be locally
optimal. This problem is discussed in detail in the earlier technical report ISRN
TELCORDIA--2011-04+PR-0GARAU. The Multiscale SVD (MSVD) addresses this by providing a spectral
readout at all scales. The algorithms for both low and high numbers of dimensions are described below.

4.1.1  Small  Number  of  Dimensions  (less  than  or  equal  to  3) 

Here,  we  consider  multivariate  data  streams  that  are  tagged  using  a  small  number  of  dimensions. 
An  example  would  be  multivariate  time-series  data  comprising  readings  from  several  sensors. 
Note  that  while  the  number  of  sensors  may  be  very  large,  after  registration,  each  “observed” 
vector,  of  possibly  high  dimensionality,  is  associated  with  a  scalar  variable  -  in  this  case,  time.  If 
additional spatial information were available, then each vector would be tagged with a
3-dimensional variable, namely (time, latitude, longitude).

Let $\{(x_i, y_i)\}$ for $i = 1, 2, \ldots, N$ represent the data set, where each $x_i$ is associated with a
vector $y_i \in \mathbb{R}^d$. We outline the algorithm for small $d$ below; the next section outlines the
algorithm for larger values of $d$. The basic construction of MSVD involves imposing a dyadic
grid on a window (hyper-cube) of appropriate size (based on application needs). For static
datasets, this may be the entire dataset, while for streaming datasets, it could be a sliding or
non-overlapping window of a specific size. Without loss of generality, assume that $y_i$ resides in
the unit hypercube $[0,1]^d$.
1. Impose a dyadic grid on the unit hypercube $[0,1]^d$ up to scale $S$. Define the interval
$I_{s,j_1,\ldots,j_d} \subset [0,1]^d$, where the $i$-th dimension of $I_{s,j_1,\ldots,j_d}$ is in
$[j_i 2^{-s}, (j_i + 1) 2^{-s}]$ for $j_i = 0, 1, \ldots, (2^s - 1)$ and $s = 1, 2, \ldots, S$.

2. Define $X_{s,j_1,\ldots,j_d} = \{x_i \mid y_i \in I_{s,j_1,\ldots,j_d}\}$ as the subset of data points in the interval
$I_{s,j_1,\ldots,j_d}$.

3. Construct the matrix $M_{s,j_1,\ldots,j_d}$ using the data points in $X_{s,j_1,\ldots,j_d}$, with $n$ columns.
The number of rows is equal to the number of points in $X_{s,j_1,\ldots,j_d}$.

4. Compute the SVD $M_{s,j_1,\ldots,j_d} = U \Sigma V^{*}$. Store the singular values and singular
vectors $\{(\sigma_1, \sigma_2, \ldots, \sigma_k), v_1, v_2, \ldots, v_k\}$, where each $v_i$ is an $n$-dimensional vector.

5.  Repeat  steps  2  through  4  for  all  scales  and  intervals. 

6. For any given interval at the finest scale, there are exactly $(S - 1)$ intervals (one at each
coarser scale) that contain it. The corresponding sets of singular values and vectors completely
characterize the data cloud for that interval.


The dyadic tree of singular values and vectors constitutes the MSVD of the dataset.
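
The six steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the project implementation: the function names `msvd_dyadic` and `ancestors`, the dictionary layout of the tree, the choice to skip empty intervals, and the convention that a tag equal to 1 falls in the last interval are all hypothetical.

```python
import numpy as np

def msvd_dyadic(x, y, max_scale):
    """Return {(s, j_1, ..., j_d): (singular_values, right_singular_vectors)}.

    x: (N, n) array of observed vectors; y: (N, d) array of tags in [0, 1]^d.
    Only non-empty intervals are stored (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    tree = {}
    for s in range(1, max_scale + 1):
        cells_per_axis = 2 ** s
        # Step 1: dyadic index of each tag along every axis (y == 1 -> last cell).
        idx = np.minimum((y * cells_per_axis).astype(int), cells_per_axis - 1)
        # Step 2: group the points by interval.
        cells, inverse = np.unique(idx, axis=0, return_inverse=True)
        inverse = inverse.ravel()
        for c, cell in enumerate(cells):
            # Step 3: one row per point in the interval, n columns.
            M = x[inverse == c]
            # Step 4: keep singular values and right singular vectors (rows of vh).
            _, sigma, vh = np.linalg.svd(M, full_matrices=False)
            tree[(s, *map(int, cell))] = (sigma, vh)
    return tree

def ancestors(key):
    """Step 6: keys of the coarser intervals (scales 1..s-1) containing `key`."""
    s, *j = key
    return [(t, *(ji >> (s - t) for ji in j)) for t in range(1, s)]
```

For a finest-scale key such as `(3, 5)` with `d = 1`, `ancestors` returns the keys of the two coarser intervals that contain it, and looking these up in the tree gives the multiscale readout for that interval.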

4.1.2  High  Number  of  Dimensions  (greater  than  3) 

The  primary  issue  with  higher  dimensions  is  the  exponential  growth  in  the  number  of  intervals  to 
be  processed.  Further,  the  actual  measurement  data  is  kept  separate  from  the  construction  of  the 
dyadic  grid  in  the  scheme  described  above.  In  the  general  setup,  consider  any  multivariate 
dataset normalized approximately to the unit ball. The objective is to create multiscale
characterizations by considering balls of sizes $1, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \ldots$ around each point in the dataset. The
rest of the construction is similar to that described earlier. Briefly, construct the SVD for the points in
each ball to obtain the MSVD.
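
A minimal sketch of this ball-based construction follows. It assumes a dataset small enough for brute-force pairwise distances; the function name `msvd_balls` and the returned layout are hypothetical. The text does not say whether the points in a ball are centered before the SVD, so this sketch leaves them uncentered.

```python
import numpy as np

def msvd_balls(points, num_scales):
    """For each point, SVD of the points within radius 1, 1/2, 1/4, ...

    points: (N, n) array, assumed roughly normalized to the unit ball.
    Returns one list per point, holding one
    (radius, singular_values, right_singular_vectors) tuple per scale.
    """
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise distances: O(N^2) memory, suitable only for small N.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    result = []
    for i in range(len(points)):
        per_scale = []
        for s in range(num_scales):
            radius = 2.0 ** (-s)
            # The ball always contains point i itself, so it is never empty.
            ball = points[dist[i] <= radius]
            _, sigma, vh = np.linalg.svd(ball, full_matrices=False)
            per_scale.append((radius, sigma, vh))
        result.append(per_scale)
    return result
```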

For  very  large  datasets,  we  will  use  the  randomized  approximate  nearest  neighbors  algorithm 
(defined in the earlier technical report ISRN TELCORDIA--2011-03+PR-0GARAU) to obtain
random  samples  of  points  contained  in  balls  at  multiple  scales.  This  provides  a  rapid  way  to 
construct  the  MSVD  tree  with  the  desired  scaling  behavior. 
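
The sampling idea can be illustrated as follows. This sketch does not reproduce the randomized approximate nearest neighbors algorithm of the earlier report; as a stand-in, it finds ball members by an exact brute-force distance computation and then draws a uniform random subsample. The function name `sampled_ball_svd` and the parameter `max_samples` are hypothetical.

```python
import numpy as np

def sampled_ball_svd(points, center, radius, max_samples, rng):
    """SVD of at most `max_samples` random points within `radius` of `center`.

    Returns (member_indices, singular_values, right_singular_vectors).
    """
    points = np.asarray(points, dtype=float)
    # Stand-in for an approximate nearest neighbors query (exact, brute force).
    dist = np.linalg.norm(points - center, axis=1)
    members = np.flatnonzero(dist <= radius)
    if len(members) == 0:
        raise ValueError("no points fall inside the ball")
    # Subsample without replacement so the SVD cost is bounded per ball.
    if len(members) > max_samples:
        members = rng.choice(members, size=max_samples, replace=False)
    _, sigma, vh = np.linalg.svd(points[members], full_matrices=False)
    return members, sigma, vh
```

Calling this for each radius in the sequence 1, 1/2, 1/4, ... around a chosen point gives one branch of the MSVD tree at bounded cost per scale.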


Use  or  disclosure  of  data  contained  on  this  sheet  is  subject  to  restrictions  on  the  title  page  of  this  report. 
4.2 Deliverables / Milestones


Date     | Deliverables / Milestones                                                                  | Status
---------|--------------------------------------------------------------------------------------------|-------
Oct 2010 | Progress report for period 1, 1st quarter                                                  |
Jan 2011 | Progress report for period 1, 2nd quarter / complete randomized matrix decompositions task |
Apr 2011 | Progress report for period 1, 3rd quarter / complete approximate nearest neighbors task    | ✓
Jul 2011 | Progress report for period 1, 4th quarter / complete experiments - part 1                  | ✓
Oct 2011 | Progress report for period 2, 1st quarter                                                  |
Jan 2012 | Progress report for period 2, 2nd quarter / complete multiscale SVD task                   | ✓
Apr 2012 | Progress report for period 2, 3rd quarter                                                  |
Jul 2012 | Progress report for period 2, 4th quarter / complete experiments - part 2                  |
Oct 2012 | Progress report for period 3, 1st quarter                                                  |
Jan 2013 | Progress report for period 3, 2nd quarter / complete multiscale Heat Kernel task           |
Apr 2013 | Progress report for period 3, 3rd quarter                                                  |
Jul 2013 | Final project report + software + documentation on CDROM / complete experiments - part 3   |

5 Results and Discussion


There  are  no  benchmarks  or  experimental  results  to  report  for  this  quarter. 
6 Conclusions


The project is on track: the design of the multiscale SVD algorithms is complete, along with an
initial implementation. We will continue with algorithmic improvements and experimentation
using the developed algorithms in the next quarter.

No  problems  are  currently  anticipated. 

